Supplement WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

Neural Information Processing Systems

Taking a closer look at the MedDRA classification at the system organ level on its website, we find the claim of "System Organ Classes (SOCs) which are groupings by aetiology". However, as claimed in the original paper, "It should be noted that we did not perform any preprocessing of our datasets". These datasets appear in MoleculeNet as well. As mentioned in the introduction of the main paper, there are also issues with inconsistent representations and undefined stereochemistry; we list an example of each in Figure 1.



LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks

Emami, Yousef, Zhou, Hao, Nabavirazani, SeyedSina, Almeida, Luis

arXiv.org Artificial Intelligence

Unmanned Aerial Vehicles (UAVs) are increasingly being utilized in various private and commercial applications, e.g., traffic control, parcel delivery, and Search and Rescue (SAR) missions. Machine Learning (ML) methods used in UAV-Assisted Sensor Networks (UASNETs) and, especially, in Deep Reinforcement Learning (DRL) face challenges such as complex and lengthy model training, gaps between simulation and reality, and low sampling efficiency, which conflict with the urgency of emergencies, such as SAR missions. In this paper, an In-Context Learning (ICL)-Data Collection Scheduling (ICLDC) system is proposed as an alternative to DRL in emergencies. The UAV collects sensory data and transmits it to a Large Language Model (LLM), which creates a task description in natural language. From this description, the UAV receives a data collection schedule that must be executed. A verifier ensures safe UAV operations by evaluating the schedules generated by the LLM and overriding unsafe schedules based on predefined rules. The system continuously adapts by incorporating feedback into the task descriptions and using this for future decisions. This method is tested against jailbreaking attacks, where the task description is manipulated to undermine network performance, highlighting the vulnerability of LLMs to such attacks. The proposed ICLDC significantly reduces cumulative packet loss compared to both the DQN and Maximum Channel Gain baselines. ICLDC presents a promising direction for intelligent scheduling and control in UASNETs.
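The generate-verify-feedback loop described in the abstract can be sketched as follows; `query_llm`, the safety rule, and the schedule format are illustrative placeholders, not the paper's actual implementation:

```python
import random

def query_llm(task_description):
    # Hypothetical stand-in for the LLM call: proposes a data
    # collection schedule as an ordering of sensor IDs.
    sensors = list(range(5))
    random.shuffle(sensors)
    return sensors

def verify(schedule, num_sensors):
    # Predefined safety rule (illustrative): the schedule must visit
    # every sensor exactly once; otherwise the verifier overrides it
    # with a safe round-robin default.
    if sorted(schedule) != list(range(num_sensors)):
        return list(range(num_sensors))
    return schedule

def icldc_step(task_description, num_sensors=5):
    proposed = query_llm(task_description)
    safe = verify(proposed, num_sensors)
    # Feedback is appended to the task description so the next
    # in-context round can condition on past outcomes.
    feedback = f"executed schedule {safe}"
    return safe, task_description + "\n" + feedback

schedule, updated_desc = icldc_step("Collect data from 5 ground sensors.")
```

The verifier step is what keeps a manipulated (jailbroken) task description from producing an unsafe schedule: the rule check runs on the LLM output, not the prompt.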




Adaptive Protein Design Protocols and Middleware

Alsaadi, Aymen, Ash, Jonathan, Titov, Mikhail, Turilli, Matteo, Merzky, Andre, Jha, Shantenu, Khare, Sagar

arXiv.org Artificial Intelligence

Abstract: Computational protein design is experiencing a transformation driven by AI/ML. However, the range of potential protein sequences and structures is astronomically vast, even for moderately sized proteins. Hence, achieving convergence between generated and predicted structures demands substantial computational resources for sampling. Integrated Machine-learning for Protein Structures at Scale (IMPRESS) offers methods and advanced computing systems for coupling AI to high-performance computing tasks, enabling evaluation of the effectiveness of protein designs as they are developed, as well as of the models and simulations used to generate data and train models. This paper introduces IMPRESS and demonstrates the development and implementation of an adaptive protein design protocol and its supporting computing infrastructure. This leads to increased consistency in the quality of protein designs and enhanced throughput due to dynamic resource allocation and asynchronous workload execution.
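The asynchronous execution pattern the abstract describes, evaluating designs as soon as they are generated rather than waiting for a full batch, can be illustrated with a small producer/consumer sketch; the task names and placeholder "score" are assumptions, not part of IMPRESS:

```python
import asyncio

async def generate_designs(queue, n):
    # Producer: emit candidate protein designs as they are created.
    for i in range(n):
        await queue.put(f"design-{i}")
        await asyncio.sleep(0)  # yield control so evaluation can overlap
    await queue.put(None)  # sentinel: generation finished

async def evaluate_designs(queue, results):
    # Consumer: score each design as soon as it arrives
    # (asynchronous workload execution).
    while True:
        design = await queue.get()
        if design is None:
            break
        results.append((design, len(design)))  # placeholder "score"

async def main(n=4):
    queue = asyncio.Queue()
    results = []
    await asyncio.gather(
        generate_designs(queue, n),
        evaluate_designs(queue, results),
    )
    return results

scores = asyncio.run(main())
```

In a real pipeline the consumer would dispatch folding/scoring jobs to HPC resources, and the scores would feed back into the generator, which is where the adaptivity comes from.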


Better with Less: Small Proprietary Models Surpass Large Language Models in Financial Transaction Understanding

Ding, Wanying, Narendra, Savinay, Shi, Xiran, Ratnaparkhi, Adwait, Yang, Chengrui, Sabzevar, Nikoo, Yin, Ziyan

arXiv.org Artificial Intelligence

Analyzing financial transactions is crucial for ensuring regulatory compliance, detecting fraud, and supporting decisions. The complexity of financial transaction data necessitates advanced techniques to extract meaningful insights and ensure accurate analysis. Since Transformer-based models have shown outstanding performance across multiple domains, this paper seeks to explore their potential in understanding financial transactions. This paper conducts extensive experiments to evaluate three types of Transformer models: Encoder-Only, Decoder-Only, and Encoder-Decoder models. For each type, we explore three options: pretrained LLMs, fine-tuned LLMs, and small proprietary models developed from scratch. Our analysis reveals that while LLMs, such as LLaMA3-8b, Flan-T5, and SBERT, demonstrate impressive capabilities in various natural language processing tasks, they do not significantly outperform small proprietary models in the specific context of financial transaction understanding. This is particularly evident in terms of speed and cost efficiency. Proprietary models, tailored to the unique requirements of transaction data, exhibit faster processing times and lower operational costs, making them more suitable for real-time applications in the financial sector. Our findings highlight the importance of model selection based on domain-specific needs and underscore the potential advantages of customized proprietary models over general-purpose LLMs in specialized applications. Ultimately, we chose to implement a proprietary decoder-only model to handle the complex transactions that we previously could not manage. This model helps us improve transaction coverage by 14% and save more than \$13 million in annual costs.


HyMaTE: A Hybrid Mamba and Transformer Model for EHR Representation Learning

Mottalib, Md Mozaharul, Phan, Thao-Ly T., Beheshti, Rahmatollah

arXiv.org Artificial Intelligence

Electronic Health Records (EHRs) have become a cornerstone of modern-day healthcare. They are crucial for analyzing the progression of patient health; however, their complexity, characterized by long multivariate sequences, sparsity, and missing values, poses significant challenges for traditional deep learning modeling. While Transformer-based models have demonstrated success in modeling EHR data and predicting clinical outcomes, their quadratic computational complexity and limited context length hinder their efficiency and practical applications. On the other hand, State Space Models (SSMs) like Mamba present a promising alternative, offering linear-time sequence modeling and improved efficiency for handling long sequences, but they focus mostly on mixing sequence-level information rather than channel-level data. To overcome these challenges, we propose HyMaTE (A Hybrid Mamba and Transformer Model for EHR Representation Learning), a novel hybrid model tailored for representing longitudinal data, combining the strengths of SSMs with advanced attention mechanisms. By testing the model on predictive tasks on multiple clinical datasets, we demonstrate HyMaTE's ability to capture an effective, richer, and more nuanced unified representation of EHR data. Additionally, the interpretability of the outcomes achieved by self-attention illustrates the effectiveness of our model as a scalable and generalizable solution for real-world healthcare applications. Codes are available at: https://github.com/healthylaife/HyMaTE.
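The division of labor the abstract describes, linear-time sequence mixing from the SSM side plus attention-style channel mixing, can be shown with a deliberately tiny sketch; the scalar recurrence, softmax weighting, and toy EHR values are illustrative and not HyMaTE's actual architecture:

```python
import math

def ssm_scan(seq, decay=0.9):
    # Linear-time state-space-style scan over the time dimension:
    # h_t = decay * h_{t-1} + x_t (sequence-level mixing).
    h, out = 0.0, []
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out

def channel_attention(channels):
    # Toy softmax attention across channels at one time step:
    # the channel-level mixing that plain SSMs largely lack.
    weights = [math.exp(c) for c in channels]
    z = sum(weights)
    return sum((w / z) * c for w, c in zip(weights, channels))

# Two EHR feature channels over four visits (illustrative values).
ehr = [[1.0, 0.0, 2.0, 1.0],
       [0.0, 1.0, 0.0, 3.0]]
mixed_seq = [ssm_scan(ch) for ch in ehr]           # time mixing per channel
fused = [channel_attention([ch[t] for ch in mixed_seq])
         for t in range(4)]                         # channel mixing per step
```

A real hybrid replaces the scalar scan with a selective SSM block and the softmax over channels with multi-head attention, but the interleaving of the two mixing directions is the same idea.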


FRSICL: LLM-Enabled In-Context Learning Flight Resource Allocation for Fresh Data Collection in UAV-Assisted Wildfire Monitoring

Emami, Yousef, Zhou, Hao, Gaitan, Miguel Gutierrez, Li, Kai, Almeida, Luis

arXiv.org Artificial Intelligence

Unmanned Aerial Vehicles (UAVs) are vital for public safety, particularly in wildfire monitoring, where early detection minimizes environmental impact. In UAV-Assisted Wildfire Monitoring (UAWM) systems, joint optimization of sensor transmission scheduling and velocity is critical for minimizing the Age of Information (AoI) of stale sensor data. Deep Reinforcement Learning (DRL) has been used for such optimization; however, its limitations, such as low sampling efficiency, simulation-to-reality gaps, and complex training, render it unsuitable for time-critical applications like wildfire monitoring. This paper introduces a new online Flight Resource Allocation scheme based on LLM-Enabled In-Context Learning (FRSICL) to jointly optimize the UAV's flight control and data collection schedule along the trajectory in real time, thereby asymptotically minimizing the average AoI across ground sensors. In contrast to DRL, FRSICL generates data collection schedules and controls velocity using natural language task descriptions and feedback from the environment, enabling dynamic decision-making without extensive retraining. Simulation results confirm the effectiveness of the proposed FRSICL compared to Proximal Policy Optimization (PPO) and Nearest-Neighbor baselines. Nowadays, Unmanned Aerial Vehicles (UAVs) have a wide range of applications in public safety [1], energy [2], and environmental monitoring [3]. Public safety UAVs serve critical roles in emergency operations, including search and rescue (SAR), wildfire surveillance, and disaster management.
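The Age of Information metric being minimized follows a standard update rule: every sensor's AoI grows by one each time slot, and the sensor whose data is collected in that slot has its AoI reset to one. A minimal sketch under that standard definition (the sensor count and round-robin schedule are illustrative, not the paper's setup):

```python
def simulate_aoi(schedule, num_sensors):
    # Each slot, every sensor's AoI grows by 1; the sensor served in
    # that slot resets to 1 (a fresh sample has just been delivered).
    aoi = [0] * num_sensors
    total = 0
    for served in schedule:
        aoi = [a + 1 for a in aoi]
        aoi[served] = 1
        total += sum(aoi)
    return total / len(schedule)  # average network AoI per slot

# Round-robin service of 3 sensors over 6 slots.
avg = simulate_aoi([0, 1, 2, 0, 1, 2], num_sensors=3)
```

A scheduler like FRSICL would choose `schedule` (and the UAV velocity that determines which sensors are reachable in each slot) to drive this average down.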


Deep Learning-Based Semantic Segmentation for Real-Time Kidney Imaging and Measurements with Augmented Reality-Assisted Ultrasound

Luijten, Gijs, Scardigno, Roberto Maria, de Paiva, Lisle Faray, Hoyer, Peter, Kleesiek, Jens, Buongiorno, Domenico, Bevilacqua, Vitoantonio, Egger, Jan

arXiv.org Artificial Intelligence

Ultrasound (US) is widely accessible and radiation-free but has a steep learning curve due to its dynamic nature and non-standard imaging planes. Additionally, the constant need to shift focus between the US screen and the patient poses a challenge. To address these issues, we integrate deep learning (DL)-based semantic segmentation for real-time (RT) automated kidney volumetric measurements, which are essential for clinical assessment but are traditionally time-consuming and prone to fatigue. This automation allows clinicians to concentrate on image interpretation rather than manual measurements. Complementing DL, augmented reality (AR) enhances the usability of US by projecting the display directly into the clinician's field of view, improving ergonomics and reducing the cognitive load associated with screen-to-patient transitions. Two AR-DL-assisted US pipelines on HoloLens-2 are proposed: one streams directly via the application programming interface for a wireless setup, while the other supports any US device with video output for broader accessibility. We evaluate RT feasibility and accuracy using the Open Kidney Dataset and open-source segmentation models (nnU-Net, Segmenter, YOLO with MedSAM and LiteMedSAM). Our open-source GitHub pipeline includes model implementations, measurement algorithms, and a Wi-Fi-based streaming solution, enhancing US training and diagnostics, especially in point-of-care settings.
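Kidney volume is commonly estimated in clinical ultrasound from length, width, and depth with the prolate ellipsoid approximation V = pi/6 * L * W * D, with the segmentation masks supplying the axis measurements. A minimal sketch under that standard formula (the bounding-box axis extraction, pixel spacing, and depth-equals-width assumption are simplifications, not necessarily the pipeline's exact algorithm):

```python
import math

def ellipsoid_volume(length_cm, width_cm, depth_cm):
    # Prolate ellipsoid approximation: V = pi/6 * L * W * D,
    # giving millilitres when the inputs are in centimetres.
    return math.pi / 6 * length_cm * width_cm * depth_cm

def axes_from_mask(mask, px_cm=0.1):
    # Toy axis extraction from a 2D binary segmentation mask:
    # bounding-box extents scaled by an assumed pixel spacing (cm/px).
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    length = (max(ys) - min(ys) + 1) * px_cm
    width = (max(xs) - min(xs) + 1) * px_cm
    return length, width

# Tiny illustrative mask standing in for a model's kidney segmentation.
mask = [[0, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 1, 0]]
L, W = axes_from_mask(mask)
vol = ellipsoid_volume(L, W, depth_cm=W)  # depth ~ width (assumption)
```

Automating this measurement step is what frees the clinician to focus on interpretation; the AR display then renders the segmentation and the computed volume in the field of view.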